IRISA participation to BioNLP-ST13: lazy-learning and information retrieval for information extraction tasks

Author

  • Vincent Claveau
Abstract

This paper describes the information extraction techniques developed for the participation of IRISA-TexMex in the following BioNLP-ST13 tasks: Bacterial Biotope subtasks 1 and 2, and Graph Regulation Network. The approaches are general-purpose: they rely on neither specialized preprocessing nor specialized external data, and they are expected to work independently of the domain of the texts processed. They are based on standard machine learning techniques, with an emphasis on similarity measures inherited from information retrieval (Okapi-BM25 (Robertson et al., 1998), language modeling (Hiemstra, 1998)). The good results obtained on these tasks show that these simple settings are competitive, provided that the representation and similarity measure chosen are well suited to the task.
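To make the idea of lazy learning with an information-retrieval similarity concrete, here is a minimal sketch of a k-nearest-neighbors classifier that ranks training examples by an Okapi-BM25 score. It is not the authors' implementation, which this listing does not describe. The function name `bm25_knn`, the token-list input format, the toy labels, and the parameter values `k1=1.2` and `b=0.75` (common defaults) are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_knn(train, query, k=3, k1=1.2, b=0.75):
    """Label `query` by majority vote among its k most BM25-similar examples.

    train: list of (tokens, label) pairs; query: list of tokens.
    Hypothetical helper, not taken from the paper.
    """
    n_docs = len(train)
    avg_len = sum(len(tokens) for tokens, _ in train) / n_docs

    # Document frequency: number of training examples containing each term.
    df = Counter()
    for tokens, _ in train:
        df.update(set(tokens))

    def idf(term):
        # Smoothed, non-negative IDF variant.
        return math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))

    def score(doc_tokens):
        tf = Counter(doc_tokens)
        # Length normalization shared by all terms of this example.
        norm = k1 * (1 - b + b * len(doc_tokens) / avg_len)
        return sum(
            idf(t) * tf[t] * (k1 + 1) / (tf[t] + norm)
            for t in set(query)
            if t in tf
        )

    # Lazy learning: no model is trained, neighbors are ranked at query time.
    neighbors = sorted(train, key=lambda ex: score(ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration only).
train = [
    (["bacteria", "live", "in", "soil"], "Habitat"),
    (["bacteria", "found", "in", "soil", "samples"], "Habitat"),
    (["gene", "regulates", "protein", "expression"], "Regulation"),
    (["protein", "binds", "gene", "promoter"], "Regulation"),
]
print(bm25_knn(train, ["soil", "bacteria"], k=3))
```

In this kind of setup, the choice of representation (which tokens or features describe an example) and of similarity function matters more than the classifier itself, which is the point the abstract makes.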


Similar resources

Participation de l'IRISA à DeFT2012 : recherche d'information et apprentissage pour la génération de mots-clés (IRISA participation to DeFT2012: information retrieval and machine-learning for keyword generation) [in French]

This paper describes the IRISA participation in the DeFT 2012 text-mining challenge. The challenge consisted in the automatic assignment or generation of keywords for scientific journal articles. Two tasks were proposed, which led us to test two different strategies. For the first task, a list of keywords w...


IRISA Participation in JRS 2012 Data-Mining Challenge: Lazy-Learning with Vectorization

In this article, we report on our participation in the JRS Data-Mining Challenge. The approach used by our system is a lazy-learning one, based on a simple k-nearest-neighbors technique. More specifically, we treated this challenge as an opportunity to test Information Retrieval (IR) inspired techniques in such a data-mining framework. In particular, we tested different similarity measures, inc...


Overview of BioNLP Shared Task 2011

The BioNLP Shared Task 2011, an information extraction task held over 6 months up to March 2011, met with community-wide participation, receiving 46 final submissions from 24 teams. Five main tasks and three supporting tasks were arranged, and their results show advances in the state of the art in fine-grained biomedical domain information extraction and demonstrate that extraction methods succ...



Self-training and co-training in biomedical word sense disambiguation

Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction that attempts to select the proper sense of ambiguous words. Due to the scarcity of training data, semi-supervised learning, which profits from seed annotated examples and a large set of unlabeled data, is worth researching. We present preliminary results of two semi-supervised learnin...



Publication date: 2013